(57) ABSTRACT

The invention is directed to the transformation of software loops having early exit conditions, thereby allowing the loops to be more effectively converted to a single basic block for software pipelining. The invention assigns a predicate register for each early exit condition of the software loop. The predicate registers are set when the corresponding early exit condition is satisfied. In this manner, when the loop terminates the predicate registers can be examined to indicate which early exit conditions were satisfied. The invention produces loops having a lower recurrence II and resource II than conventional techniques.
EARLY EXIT TRANSFORMATIONS FOR SOFTWARE PIPELINING

TECHNICAL FIELD

This invention relates generally to the field of computing environments and, more particularly, to a method of transforming software loops having early exits.

BACKGROUND INFORMATION

In order to accelerate the processing of data, many high-performance computing systems overlap the execution of loop iterations using a technique called software pipelining. This improves the utilization of available hardware resources by increasing instruction-level parallelism. The task of software pipelining is simplified when the loop consists of a single basic block that has a single loop exit. Thus, in order to generate code that can be software pipelined, compilers strive to transform loops that have multiple exits (a normal loop exit and one or more early exits) into loops having a single exit. Current techniques, however, often produce transformed loops that are inefficient and have high complexity. For these reasons, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need for the present invention.

SUMMARY OF THE INVENTION

As explained in detail below, the invention is directed to the transformation of software loops having early exit conditions. In one embodiment the invention transforms the software loop by assigning a predicate register for each early exit condition of the software loop such that the predicate registers are set within the software loop when the corresponding early exit condition is satisfied. The predicate registers are examined after termination of the transformed loop in order to determine which early exit condition prevailed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing functional components of the computer in conjunction with which embodiments of the invention may be practiced;

FIG. 2 is a flowchart illustrating a general software program that has a loop with two early exits;

FIG. 3 is a flowchart illustrating the software loop of FIG. 2 using predicated instructions;

FIG. 4 is a flowchart illustrating a conventional method for transforming loops having early exit conditions;

FIG. 5 is a flowchart illustrating the software program of FIG. 3 after transformation according to the method of FIG. 4;

FIG. 6 is a flowchart illustrating an improved technique for transforming loops having early exit conditions; and

FIG. 7 is a flowchart illustrating the software program of FIG. 3 after transformation according to the improved transformation method of FIG. 6.

DETAILED DESCRIPTION

In the following detailed description, references are made to the accompanying drawings that illustrate specific embodiments in which the invention may be practiced. The following detailed description is not to be taken in a limiting sense and the scope of the invention is defined by the claims.

FIG. 1 illustrates a computing system 1000 that represents any general purpose computing device having various internal computing components including CPU 1010, read-only memory (ROM) 1015, random-access memory (RAM) 1020, and one or more busses 1025 that operatively couple the components. There may be only one processing unit, such that computing system 1000 comprises a single central-processing unit (CPU), or a plurality of processing units, commonly referred to as a parallel processing environment. Basic input/output system (BIOS) 1035 contains all code required to control basic devices including a keyboard, a [...] disk, [...] communications, etc.

Computing system 1000 further includes storage device 1040 for accessing computer-readable medium 1050 that represents any storage medium, such as a disk-shaped data storage medium, for holding digital information. Computer-readable medium 1050 may be an internal hard disk or a removable data storage device such as a floppy diskette, a magneto-optical disk, a SuperDisk™ diskette, a Zip™ disk, a Jaz™ disk, a tape cartridge, etc. Storage device 1040 represents any device suitable for servicing access requests, such as an internal hard drive, a floppy drive, a magneto-optical drive, a CD ROM drive, a SuperDisk™ drive, a removable-cartridge hard drive such as a Zip™ drive, or even a tape drive. Operating system 1055 provides an interface by which one or more software applications operate storage device 1040 in order to access the digital information held by computer-readable medium 1050. For example, compiler 1060 interfaces with operating system 1055 to generate machine instructions executable by CPU 1010. According to the invention, compiler 1060 transforms software loops having early exit conditions.

FIG. 2 illustrates a general software program 5 having loop 7 that contains two early exits represented by blocks 20 and 30. As shown in FIG. 2, software program 5 starts in block 10 and proceeds to block 12, which represents instructions that are executed prior to entering loop 7. For example, software program 5 may initialize a loop counter within block 12.

Next, software program 5 enters loop 7 by proceeding to block 15. Block 15 represents any instruction, or set of instructions, that is performed for each iteration of loop 7, such as incrementing the loop counter. Block 20 of software program 5 represents the first early exit condition. When the condition of block 20 is true, software program 5 executes block 20A and terminates with block 50. When the condition of block 20 is false, software program 5 proceeds to the next block, which represents one or more instructions. Next, software program 5 evaluates the second early exit condition in block 30. When the second early exit condition is true, software program 5 executes block 30A and terminates with block 50. When the second early exit condition is false, software program 5 executes block 35 and proceeds to block 40, which is referred to herein as the loop branch for loop 7. In block 40 software program 5 determines whether to exit loop 7. If the loop exit condition is false, loop 7 is repeated. If the loop exit condition is true, block 40A is executed and software program 5 terminates with block 50.

As explained in detail below, the invention exploits certain characteristics of predicated instruction sets in order to improve loop transformation. In such an instruction set, predicated instructions are executed only if a certain condition is true, i.e., if the qualifying predicate register is set to one. For example, consider the following branch instruction:
"(P1) branch 10". Here, P1 is the predicate register, and the branch instruction is only executed if P1 is true. The following pseudo code illustrates how loop 7 of FIG. 2 could be implemented with a predicated instruction set using predicate registers P1 through P6:



    A1   instruction #1
    A2   cmp p1, p2 = (A == B)
    A3   (p1) branch to block 20A of FIG. 2
    A4   instruction #2
    A5   cmp p3, p4 = (B > C)
    A6   (p3) branch to block 30A of FIG. 2
    A7   instruction #3
    A8   cmp p5, p6 = (DONE?)
    A9   (p6) branch to line A1



In the above pseudo code, line A1 executes instruction #1. Lines A2 and A3 implement the first early exit condition of FIG. 2, i.e., block 20. Line A2 compares A and B; it sets P1 to one and P2 to zero when A equals B, and sets P2 to one and P1 to zero when A is not equal to B. Line A3 is a predicated instruction, i.e., the branch to block 20A is only executed if predicate register P1 is set to one. Otherwise, control flows to line A4, which executes instruction #2. Lines A5 and A6 operate similarly to implement the second early exit condition of block 30. Line A8 tests whether the loop is finished and sets P5 and P6 accordingly. Line A9 branches to line A1 (block 15 of FIG. 2) if P6 is set to one, i.e., if loop 7 is not finished. FIG. 3 is a flowchart illustrating software program 5 of FIG. 2 as implemented using predicated instructions as described above.
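The predicated control flow of lines A1 through A9 can be traced with a small interpreter. The Python sketch below is a hypothetical model, not the patent's implementation: the name `run_loop7`, the per-iteration operand tuples `(A, B, C)`, and the reading of "DONE?" as "all tuples have been consumed" are assumptions made for illustration. Each predicate register is an entry of the list `p`, and each branch is taken only when its qualifying predicate is true.

```python
from typing import List, Sequence, Tuple


def run_loop7(values: Sequence[Tuple[int, int, int]]) -> Tuple[str, int]:
    """Hypothetical interpreter for lines A1-A9 (loop 7 of FIG. 2).

    values[k] holds assumed (A, B, C) operands for iteration k + 1; the loop
    is DONE once every tuple has been consumed. Returns the exit block that
    is reached and the number of iterations executed.
    """
    p: List[bool] = [False] * 7          # predicate registers p1..p6 (p[0] unused)
    i = 0                                # block 12: initialize the loop counter
    while True:
        i += 1                           # A1: instruction #1 (block 15) counts iterations
        a, b, c = values[i - 1]
        p[1], p[2] = a == b, a != b      # A2: cmp p1, p2 = (A == B)
        if p[1]:                         # A3: (p1) branch to block 20A
            return "20A", i
        # A4: instruction #2 (no observable effect in this model)
        p[3], p[4] = b > c, not (b > c)  # A5: cmp p3, p4 = (B > C)
        if p[3]:                         # A6: (p3) branch to block 30A
            return "30A", i
        # A7: instruction #3 (no observable effect in this model)
        done = i == len(values)
        p[5], p[6] = done, not done      # A8: cmp p5, p6 = (DONE?)
        if not p[6]:                     # A9: (p6) branch to A1, else fall into 40A
            return "40A", i


print(run_loop7([(1, 2, 3), (4, 4, 0)]))  # ('20A', 2)
print(run_loop7([(1, 2, 3), (0, 1, 2)]))  # ('40A', 2)
```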

One conventional approach for transforming loops having 
multiple exits to a loop with a single exit is described by 
Tirumalai, et al. in "Parallelization of Loops With Exits on 
Pipelined Architectures", Supercomputing Conference, Dec. 
1990, pages 200-212. According to this approach, a register 
is used to record the prevailing exit condition. After the loop 
terminates, the register is examined in order to determine 
which exit condition was satisfied. Based on which exit 
condition exists, the software program takes any necessary 
corrective action. 

After this transformation, the loop has a single exit but still consists of multiple basic blocks. It can be transformed into a loop with a single basic block using a known technique such as "if-conversion". Those skilled in the art will know that if-conversion on a set of basic blocks removes branches by appropriately predicating instructions in such blocks.
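As a rough illustration of if-conversion, the Python sketch below converts a single if-then-else diamond into straight-line predicated code. The helper names (`if_convert`, `execute`, `assign`) and the representation of an instruction as a `(predicate, operation)` pair are hypothetical; a production compiler if-converts arbitrary acyclic regions, whereas this sketch handles only one diamond.

```python
from typing import Callable, Dict, List, Optional, Tuple

Env = Dict[str, int]
Op = Callable[[Env], None]
PredInstr = Tuple[Optional[str], Op]  # (qualifying predicate or None, operation)


def if_convert(cond: Callable[[Env], bool], then_ops: List[Op],
               else_ops: List[Op], pt: str, pf: str) -> List[PredInstr]:
    """If-convert the diamond `if cond: then_ops else: else_ops`.

    One compare writes complementary predicates pt/pf; every operation of the
    former then/else blocks is guarded by the matching predicate, so the
    branch disappears and the region becomes a single basic block.
    """
    def compare(env: Env) -> None:
        taken = cond(env)
        env[pt], env[pf] = int(taken), int(not taken)

    code: List[PredInstr] = [(None, compare)]
    code += [(pt, op) for op in then_ops]
    code += [(pf, op) for op in else_ops]
    return code


def execute(code: List[PredInstr], env: Env) -> Env:
    """Run straight-line predicated code, skipping ops whose predicate is 0."""
    for guard, op in code:
        if guard is None or env[guard]:
            op(env)
    return env


def assign(var: str, value: int) -> Op:
    def op(env: Env) -> None:
        env[var] = value
    return op


region = if_convert(lambda e: e["a"] == e["b"],
                    [assign("x", 1)], [assign("x", 2)], "p1", "p2")
print(execute(region, {"a": 3, "b": 3})["x"])  # 1
print(execute(region, {"a": 3, "b": 4})["x"])  # 2
```

Both arms remain in the straight-line list; only the predicates decide which arm takes effect, so the region contains no branch.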

FIG. 4 is a flowchart 200 illustrating in more detail the Tirumalai method for transforming loops having early exit conditions. The transformation method starts in step 202 and proceeds to step 205. In step 205, the method introduces code to initialize a register (R) to a predetermined value such as zero. This register is used to record the prevailing exit condition for loop 7. Next, as illustrated in step 210, the method creates a new loop branch for loop 7. This new loop branch determines whether an exit condition has been met by checking whether R equals 0. If no condition has been met, the new loop branch jumps to the top of loop 7. In step 215, the transformation method creates a new target block for each early exit and for the original loop branch. These target blocks write the register (R) in order to record which exit condition has been met. In step 220 the method modifies the original loop branch to jump to one of the new target blocks instead of the top of the loop. In step 225 the method creates a series of branches that are executed after the loop terminates. These branches examine the register and jump to the original destinations of the early exits. Finally, in step 227 the compiler converts the transformed loop into a loop having a single basic block using a known technique such as "if-conversion".

FIG. 5 illustrates the Tirumalai transformation method as applied to software program 5 of FIG. 2, thereby resulting in software program 5' having transformed loop 7'. According to step 205 of FIG. 4, block 12 has been modified such that the register (R) is initialized to zero. According to step 210 of the method, new block 55 is created, which sets predicate registers P7 and P8 based on a comparison between R and zero. Thus, if R is zero then P7 is set and software program 5' branches to block 15. According to step 215, the method creates a series of new target blocks 20B, 30B and 40B that modify R in order to record the prevailing exit condition of loop 7'. Next, according to step 220, the method modifies the original loop branch, block 40 of FIG. 3, such that the branch jumps to new target block 40B when the loop is done and otherwise falls through to block 55. Finally, according to step 225 of FIG. 4, the method adds block 60 that examines the register and jumps to original exit blocks 20A, 30A or 40A depending on the exit condition.
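The transformed loop 7' can be modeled in the same style. The Python sketch below is an illustration under stated assumptions, not Tirumalai's code: the function name `run_loop7_tirumalai`, the per-iteration operand tuples `(A, B, C)`, the reading of "DONE?" as "all tuples consumed", and the encoding of R (1, 2 and 3 for exits 20A, 30A and 40A, matching the values written by lines B3, B6 and B9 of the listing that follows) are chosen for the example.

```python
from typing import Sequence, Tuple


def run_loop7_tirumalai(values: Sequence[Tuple[int, int, int]]) -> Tuple[str, int]:
    """Hypothetical model of transformed loop 7' of FIG. 5.

    Every exit records its identity in register R and then reaches the single
    new loop branch (block 55); block 60 dispatches after the loop ends.
    """
    r = 0                          # block 12 (step 205): R = 0
    i = 0
    while True:
        i += 1                     # block 15: instruction #1 counts iterations
        a, b, c = values[i - 1]
        if a == b:                 # block 20: first early exit
            r = 1                  # new target block 20B
        elif b > c:                # block 30 (reached after instruction #2)
            r = 2                  # new target block 30B
        elif i == len(values):     # block 40: original loop branch, now
            r = 3                  # jumping to new target block 40B
        if r != 0:                 # block 55: repeat the loop only while R == 0
            break
    # block 60 (step 225): examine R and jump to the original exit block
    return {1: "20A", 2: "30A", 3: "40A"}[r], i


print(run_loop7_tirumalai([(1, 2, 3), (1, 5, 0)]))  # ('30A', 2)
```

Each exit costs one extra write of R, and the new loop branch adds a compare of R against zero on every iteration.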

The following pseudocode is one example of how a compiler could convert the transformed loop 7' of FIG. 5 into a single basic block:



    B1   instruction #1
    B2   cmp p1, p2 = (A == B)
    B3   (p1) R = 1
    B4   (p2) instruction #2
    B5   (p2) cmp.unc p3, p4 = (B > C)
    B6   (p3) R = 2
    B7   (p4) instruction #3
    B8   (p4) cmp.unc p5, p6 = (DONE?)
    B9   (p5) R = 3
    B10  cmp p7, p8 = (R == 0)
    B11  (p7) branch to line B1



As illustrated in the above pseudocode, for a software loop having N exits, the Tirumalai approach requires N new instructions for setting the value of the register R (here B3, B6 and B9 for the three exits of loop 7). These additions, as well as the addition of an extra compare (instruction B10), lead to an inefficient conversion of loop 7 to a single exit loop.

A common metric that indicates the efficiency of a software-pipelined loop is known as the initiation interval (II), which is the interval between the start of two successive iterations of a software-pipelined loop. II is bounded from below by the maximum of ResourceII and RecurrenceII. ResourceII is determined by the number of instructions in the loop and the hardware resources available to execute them. RecurrenceII is determined by the circular chains of dependences in the loop.
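The lower bound just stated is MinII = max(ResourceII, RecurrenceII). The Python sketch below computes it under stated assumptions: ResourceII is taken as ceil(operations / units) per resource class, RecurrenceII as ceil(total latency / total iteration distance) per dependence cycle, and the example machine, which issues four instructions of any kind per cycle, is hypothetical. The inputs reuse figures given in this description: 11 instructions and a 5-cycle recurrence for listing B1-B11, and 7 instructions and a 3-cycle recurrence for listing C1-C7.

```python
import math
from typing import Dict, Iterable, Tuple


def resource_ii(uses: Dict[str, int], units: Dict[str, int]) -> int:
    """Per resource class, ceil(operations needing it / units providing it)."""
    return max(math.ceil(uses[r] / units[r]) for r in uses)


def recurrence_ii(cycles: Iterable[Tuple[int, int]]) -> int:
    """Per dependence cycle, ceil(total latency / total iteration distance)."""
    return max(math.ceil(lat / dist) for lat, dist in cycles)


def min_ii(uses: Dict[str, int], units: Dict[str, int],
           cycles: Iterable[Tuple[int, int]]) -> int:
    """Lower bound on the initiation interval: max(ResourceII, RecurrenceII)."""
    return max(resource_ii(uses, units), recurrence_ii(cycles))


# Hypothetical machine: a single resource class issuing 4 instructions per cycle.
print(min_ii({"issue": 11}, {"issue": 4}, [(5, 1)]))  # listing B1-B11: 5
print(min_ii({"issue": 7}, {"issue": 4}, [(3, 1)]))   # listing C1-C7: 3
```

On this assumed machine the recurrence, not the instruction count, sets the bound for both listings, which is why shortening the dependence cycle matters.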

The ResourceII of the loop in the above pseudocode could potentially be increased by the addition of the four new instructions B3, B6, B9, and B10. This loop has a minimum recurrence II of 5 cycles, as represented by the following circular chain of dependences between instructions: B10→B2→B5→B8→B9→B10, assuming that each instruction requires one cycle to execute. Those skilled in the art will realize that B10→B2 is a control dependence edge while the others are data dependence edges. In other words, the minimum number of cycles necessary between the start of successive iterations is determined by the dependence chain through compare instructions B2, B5, and B8, setting the register R in instruction B9, and executing the comparison in instruction B10.
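The recurrence bound can be checked mechanically by enumerating the elementary cycles of the dependence graph. In the Python sketch below, the graph for listing B1-B11 contains the chain named above plus the writes of R in B3 and B6, on which B10 also depends; the B10→B2 control edge carries an iteration distance of one. The graph encoding and the function name `rec_ii` are illustrative assumptions, and every instruction is assumed to take one cycle, as in the text. The second graph encodes the chain C6→C2→C4→C6 that this description gives for the inventive listing.

```python
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, int]]]  # src -> [(dst, iteration distance)]


def rec_ii(graph: Graph, latency: int = 1) -> int:
    """Max over elementary cycles of ceil(total latency / total distance).

    Every instruction is assumed to take `latency` cycles, so each edge
    contributes the latency of its source instruction.
    """
    best = 0

    def walk(start: str, node: str, lat: int, dist: int, seen: Set[str]) -> None:
        nonlocal best
        for dst, d in graph.get(node, []):
            if dst == start and dist + d > 0:
                best = max(best, math.ceil((lat + latency) / (dist + d)))
            elif dst not in seen:
                walk(start, dst, lat + latency, dist + d, seen | {dst})

    for start in graph:
        walk(start, start, 0, 0, {start})
    return best


B_DEPS: Graph = {
    "B2": [("B3", 0), ("B5", 0)],   # p1 qualifies B3, p2 qualifies B5
    "B3": [("B10", 0)],             # B3 writes R, B10 reads it
    "B5": [("B6", 0), ("B8", 0)],
    "B6": [("B10", 0)],
    "B8": [("B9", 0)],
    "B9": [("B10", 0)],
    "B10": [("B2", 1)],             # control dependence into the next iteration
}
C_DEPS: Graph = {"C2": [("C4", 0)], "C4": [("C6", 0)], "C6": [("C2", 1)]}

print(rec_ii(B_DEPS), rec_ii(C_DEPS))  # 5 3
```

The shorter cycles through B3 and B6 (3 and 4 cycles) are dominated by the 5-cycle chain through B8 and B9.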
FIG. 6 is a flowchart 300 illustrating one embodiment of
the inventive transformation method for transforming loops 
having early exit conditions. This inventive technique 
exploits certain characteristics of predicated instruction sets 
in order to improve loop transformation. For example, the 
invention uses predicate registers to record the prevailing 
exit condition. After the loop terminates, the predicate 
registers are examined in order to determine which exit 
condition was satisfied. As will be apparent to one skilled in 
the art after reading the discussion below, the invention 
enables an optimizing compiler to more efficiently pipeline 
the transformed loops. 

The inventive transformation method 300 starts in step 302 and proceeds to step 305. In step 305, the inventive transformation method assigns a predicate register to the loop branch and initializes that predicate register to zero. Thus, at the beginning of each iteration of loop 7, the loop branch defaults to exiting the loop. The method similarly assigns a predicate register for each early exit and initializes it to zero. The primary reason for initializing predicate registers for the early exits is to ensure that such predicate registers do not have garbage values upon exit from the loop.

In step 310, the method creates a new bottom block and 
moves the loop branch into the new bottom block. In other 
words, the comparison for the loop branch is left unchanged
but the actual jump back to the beginning of loop 7 is moved 
to this new block. In step 315, the inventive transformation 
method modifies the target blocks for each early exit such 
that they jump to the new bottom block. In step 325 the 
method creates a series of branches in the epilog after the 
loop. These branches examine the predicate registers for the 
early exits and jump to the original destinations of the early 
exits. 
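Steps 310 through 325 amount to a rewrite of the loop's control-flow graph. The Python sketch below applies them to a successor-list encoding of loop 7 of FIG. 2. The encoding, the function name `transform_early_exits`, and the label "i2" for the unnumbered block that holds instruction #2 are assumptions; the labels 40" and 60 follow FIG. 7.

```python
from typing import Dict, List

CFG = Dict[str, List[str]]  # block label -> successor labels


def transform_early_exits(cfg: CFG, header: str, loop_branch: str,
                          early_exits: Dict[str, str], normal_exit: str,
                          bottom: str, epilog: str) -> CFG:
    """Sketch of steps 310-325 of FIG. 6 on a successor-list CFG."""
    out = {blk: list(succs) for blk, succs in cfg.items()}
    # Step 310: the compare stays in loop_branch, which now falls through to a
    # new bottom block holding the actual jump back to the loop header.
    out[loop_branch] = [bottom]
    out[bottom] = [header, epilog]
    # Step 315: each early exit targets the bottom block instead of leaving.
    for blk, target in early_exits.items():
        out[blk] = [bottom if s == target else s for s in out[blk]]
    # Step 325: the epilog tests the early-exit predicates and dispatches.
    out[epilog] = list(early_exits.values()) + [normal_exit]
    return out


# Loop 7 of FIG. 2; "i2" is a hypothetical label for the block with instruction #2.
fig2: CFG = {"15": ["20"], "20": ["20A", "i2"], "i2": ["30"],
             "30": ["30A", "35"], "35": ["40"], "40": ["15", "40A"]}
fig7 = transform_early_exits(fig2, "15", "40", {"20": "20A", "30": "30A"},
                             "40A", bottom='40"', epilog="60")
loop_blocks = {"15", "20", "i2", "30", "35", "40", '40"'}
exits = [(b, s) for b in loop_blocks for s in fig7[b] if s not in loop_blocks]
print(exits)  # [('40"', '60')]: the loop now has a single exit
```

When an early exit is taken, the compare in block 40 is skipped, so the loop-branch predicate keeps its initial zero and the bottom block falls through to the epilog instead of jumping back.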

In step 327 a compiler converts the transformed loop into a loop having a single basic block using a known technique such as "if-conversion". Finally, in step 329, the compiler removes instructions by replacing the initializations of predicate registers and the corresponding conditional compares with unconditional compares, where possible. For example, conditional compares that dominate the loop exit in the original loop can be optimized this way.

An unconditional compare and a conditional compare differ only when the qualifying predicate register is zero. In such cases, the unconditional compare clears both target predicate registers, whereas the conditional compare leaves both target predicate registers unchanged. Using unconditional compares obviates the need for initializing predicate registers for early exits to zero in the loop entry.
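The difference between the two compare forms can be stated as two small functions. The Python sketch below models predicate registers as a dictionary of booleans; the function names are hypothetical, and the semantics follow the description in the preceding paragraph.

```python
from typing import Dict

Preds = Dict[str, bool]


def cmp_conditional(preds: Preds, qp: str, cond: bool, pt: str, pf: str) -> Preds:
    """(qp) cmp pt, pf = cond: targets are written only when qp is one."""
    if preds[qp]:
        preds[pt], preds[pf] = cond, not cond
    return preds


def cmp_unc(preds: Preds, qp: str, cond: bool, pt: str, pf: str) -> Preds:
    """(qp) cmp.unc pt, pf = cond: as above when qp is one, else clears both."""
    if preds[qp]:
        preds[pt], preds[pf] = cond, not cond
    else:
        preds[pt], preds[pf] = False, False
    return preds


# p3/p4 still hold stale values from an earlier iteration, and qp (p2) is zero.
stale = {"p2": False, "p3": True, "p4": True}
print(cmp_conditional(dict(stale), "p2", True, "p3", "p4"))  # p3, p4 keep stale True
print(cmp_unc(dict(stale), "p2", True, "p3", "p4"))          # p3, p4 cleared to False
```

With the conditional form the stale values survive unless the predicates were initialized beforehand; the unconditional form defines them on every iteration, which is what lets step 329 drop the initializations.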

FIG. 7 illustrates software program 5 of FIG. 2 after transformation according to the inventive method, thereby resulting in software program 5" having loop 7". According to step 305 of FIG. 6, block 15 has been modified such that the predicate registers used by the loop branch and the early exits are initialized to zero. According to step 310 of the method, a new block 40" is created and the original loop branch is moved from block 40 to the new block 40". The original comparison, however, remains in block 40. According to step 315, the targets for early exits in blocks 20 and 30 have been set to the new block 40". Finally, according to step 325 of FIG. 6, the method adds block 60 that determines whether the predicate registers for the early exits are set and accordingly proceeds to blocks 20A, 30A or 40A.

The following pseudo code is one example of how a compiler could convert the transformed loop 7" of FIG. 7 into a single basic block:



    C1   instruction #1
    C2   cmp.unc p1, p2 = (A == B)
    C3   (p2) instruction #2
    C4   (p2) cmp.unc p3, p4 = (B > C)
    C5   (p4) instruction #3
    C6   (p4) cmp.unc p5, p6 = (DONE?)
    C7   (p6) branch to line C1

In the above pseudo code, line C1 simply executes instruction #1. Note that the compiler has removed the initialization of predicate registers P1, P3 and P6. Line C2 implements the first early exit condition of FIG. 2, i.e., block 20. Line C2 sets P1 to one and P2 to zero when A equals B, and sets P2 to one and P1 to zero when A does not equal B. Line C3 is a predicated instruction that executes instruction #2 when P2 is set. Line C4 is also predicated by P2 and implements the second early exit condition of FIG. 2, i.e., block 30. More specifically, if the qualifying predicate register P2 is one, line C4 sets P3 to one and P4 to zero if B is greater than C, and sets P4 to one and P3 to zero if B is not greater than C. If P2 is zero, it clears both P3 and P4. Line C5 executes instruction #3 when P4 is set. Line C6, also predicated by P4, tests whether the loop is finished and sets P5 and P6 accordingly; if P4 is zero because an early exit was taken, it clears both P5 and P6. Line C7 branches to line C1 (block 15 of FIG. 2) when P6 is set, i.e., when loop 7 is not finished.
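Lines C1 through C7, followed by the epilog of block 60, can be modeled as a short Python sketch. It is a hypothetical model, not the patent's implementation: the function name `run_loop7_predicated`, the per-iteration operand tuples `(A, B, C)`, the reading of "DONE?" as "all tuples consumed", and the treatment of instructions #2 and #3 as having no observable effect are assumptions. Unconditional compares are modeled by clearing both target predicates when the qualifying predicate is zero.

```python
from typing import Dict, Sequence, Tuple


def run_loop7_predicated(values: Sequence[Tuple[int, int, int]]) -> Tuple[str, int]:
    """Hypothetical model of lines C1-C7 plus the epilog (block 60) of FIG. 7."""
    p: Dict[str, bool] = {f"p{k}": False for k in range(1, 7)}

    def cmp_unc(qp: bool, cond: bool, pt: str, pf: str) -> None:
        # cmp.unc: write cond / not cond when qp is one, otherwise clear both
        p[pt], p[pf] = (cond, not cond) if qp else (False, False)

    i = 0
    while True:
        i += 1                                          # C1: instruction #1 (block 15)
        a, b, c = values[i - 1]
        cmp_unc(True, a == b, "p1", "p2")               # C2: cmp.unc p1, p2 = (A == B)
        # C3: (p2) instruction #2 (no observable effect in this model)
        cmp_unc(p["p2"], b > c, "p3", "p4")             # C4: (p2) cmp.unc p3, p4 = (B > C)
        # C5: (p4) instruction #3 (no observable effect in this model)
        cmp_unc(p["p4"], i == len(values), "p5", "p6")  # C6: (p4) cmp.unc p5, p6 = (DONE?)
        if not p["p6"]:                                 # C7: (p6) branch to line C1
            break
    # Epilog (block 60): the early-exit predicates tell which exit prevailed.
    if p["p1"]:
        return "20A", i
    if p["p3"]:
        return "30A", i
    return "40A", i


print(run_loop7_predicated([(1, 2, 3), (4, 4, 0)]))  # ('20A', 2)
```

Every path through an iteration ends at C7, so the loop body is a single basic block with one exit, and the predicates left behind by C2 and C4 are enough for the epilog to recover the original destination.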

The pseudocode resulting from the inventive transformation method has four fewer instructions than the pseudocode resulting from the conventional method (seven instead of eleven). This reduces the ResourceII for the loop. In addition, the minimum recurrence II of this approach is only 3 cycles, resulting from the circular chain of dependences C6→C2→C4→C6. This is a significant improvement over the 5 cycles of the conventional method.

Various embodiments of the invention that transform software loops having early exit conditions have been described. Several advantages of the invention have been illustrated. For example, the resulting loops have a lower recurrence II and a lower resource II than those produced by conventional techniques. The present invention enables an optimizing compiler to more efficiently pipeline the transformed loops. It is intended that only the claims and equivalents thereof limit this invention.
We claim: 

1. A computer-implemented method for transforming a software loop having one or more early exits comprising:
assigning a predicate register for each early exit of a software loop;
setting the assigned predicate register when the corresponding early exit condition is satisfied; and
examining the assigned predicate registers when the software loop terminates to determine which early exit conditions are satisfied.

2. The method of claim 1 and further comprising:
assigning a predicate register for a loop branch that controls whether the software loop is repeated;
initializing the predicate register controlling the loop branch such that the loop branch defaults to exiting the loop; and
modifying the early exits to jump to the loop branch.

3. The method of claim 1 and further comprising:
initializing the predicate registers for each early exit condition to ensure that the predicate registers are defined for each loop iteration.

4. The method of claim 2, wherein the predicate registers are initialized for each iteration of the software loop.
5. The method of claim 1 and further including replacing at least one conditional compare that dominates a normal exit of the loop with a corresponding unconditional compare.

6. A software compiler stored on a computer-readable medium for a computer having a predicated instruction set, wherein the compiler when executed by the computer transforms a software loop having one or more early exit conditions by generating computer-executable instructions to perform the method comprising:

assigning a predicate register for each early exit condition of a software loop;

setting the assigned predicate register when the corresponding early exit condition is satisfied; and

examining the assigned predicate registers when the software loop terminates to determine which early exit conditions are satisfied.

7. The software compiler of claim 6 further generating computer-executable instructions to perform:

assigning a predicate register for a loop branch that controls whether the software loop is repeated;

initializing the predicate register controlling the loop branch such that the loop branch defaults to exiting the software loop; and

modifying early exits to jump to the loop branch.

8. The software compiler of claim 6 further generating computer-executable instructions to initialize the predicate registers assigned for each early exit condition.

9. The software compiler of claim 8, wherein the predicate initialization instructions are executed for each iteration of the software loop.

10. The software compiler of claim 6 further including generating computer-executable instructions to replace at least one conditional compare that dominates a normal exit of the loop with a corresponding unconditional compare.

11. A computer comprising a plurality of predicate registers, wherein one of the predicate registers is allocated for each early exit condition of a software loop executing on the computer, and further wherein each predicate register is set when the corresponding early exit condition is satisfied and is examined when the loop terminates to determine which early exit conditions have been satisfied.

12. The computer of claim 11 further comprising a predicate register allocated for a loop branch that controls whether the software loop is repeated, wherein the predicate register controlling the loop branch is initialized such that the loop branch defaults to exiting the loop.

13. The computer of claim 12, wherein the predicate registers allocated to the early exit conditions are initialized for each iteration of the loop.

14. The computer of claim 13, wherein the predicate register allocated to the loop branch is initialized for each iteration of the loop.

15. A computer-readable medium having computer-executable instructions to cause a computer to transform a software loop by performing the method of:

assigning a predicate register for each early exit condition of a software loop;

setting the assigned predicate register when the corresponding early exit condition is satisfied; and

examining the assigned predicate registers when the software loop terminates to determine which early exit conditions have been satisfied.

16. The computer-readable medium of claim 15 further including computer-executable instructions to cause a computer to further perform the method:

assigning a predicate register for a loop branch that controls whether the software loop is repeated;

initializing the predicate register controlling the loop branch such that the loop branch defaults to exiting the software loop; and

modifying early exit conditions to jump to the loop branch.

17. The computer-readable medium of claim 16 further including computer-executable instructions to cause a computer to initialize the predicate registers for each iteration of the loop.

18. The computer-readable medium of claim 15 further including computer-executable instructions to replace at least one conditional compare that dominates a normal exit of the loop with a corresponding unconditional compare.